1. INTRODUCTION.

Let $X = (X_1, \ldots, X_n)$ have distribution $P_\theta$, where the unknown parameter $\theta$ varies in $\Theta$.
Suppose that we need to estimate a real-valued function $\phi(\theta)$ of the parameter. Let $\hat\phi = \hat\phi(X)$
be a biased estimator of $\phi$. There exist several procedures for reducing the bias of $\hat\phi$: jackknifing,
bootstrapping (see Efron (1982)), and other procedures based on expansions of $E_\theta(\hat\phi)$ (see Cox
and Hinkley (1974, Section 8.4)). These procedures may not eliminate the bias completely, and
one often hears the following suggestion. Let $\hat\phi^{(1)}$ be obtained from $\hat\phi$ by one of these
bias-reduction procedures. If $\hat\phi^{(1)}$ is still biased, repeat the bias-reduction procedure and obtain
$\hat\phi^{(2)}$, $\hat\phi^{(3)}$, etc., until a desired amount of reduction in bias is obtained or the bias is removed
completely. Such “higher-order bias corrections” are described for instance in the review paper of
Miller (1974) in connection with the jackknife. There are examples where no unbiased estimator
of $\phi$ exists but there exists a sequence of estimators $\hat\phi^{(1)}, \hat\phi^{(2)}, \hat\phi^{(3)}, \ldots$ whose biases converge to zero
(see Section 2).
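
As a concrete instance of a single bias-reduction step, the following is a minimal sketch of the delete-one jackknife applied to the divide-by-n variance estimate. The function names and the sample data are illustrative assumptions and do not come from the note. In this particular case an unbiased estimator exists and one jackknife step already recovers it, which is the opposite of the situation studied in Section 2.

```python
from statistics import fmean, variance


def plug_in_variance(xs):
    """Biased (divide-by-n) variance estimate."""
    m = fmean(xs)
    return fmean([(x - m) ** 2 for x in xs])


def jackknife_bias_corrected(estimator, xs):
    """Delete-one jackknife: n*T(x) - (n-1) * (mean of leave-one-out T)."""
    n = len(xs)
    full = estimator(xs)
    leave_one_out = [estimator(xs[:i] + xs[i + 1:]) for i in range(n)]
    return n * full - (n - 1) * fmean(leave_one_out)


# Hypothetical sample, chosen only for illustration.
data = [2.0, 3.5, 1.0, 4.0, 6.5, 5.0]
corrected = jackknife_bias_corrected(plug_in_variance, data)
unbiased = variance(data)  # usual divide-by-(n-1) sample variance

print(plug_in_variance(data), corrected, unbiased)
```

Here the corrected value agrees with the usual sample variance up to rounding, so the bias is removed in one step and iterating the correction is unnecessary.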

The purpose of this note is to show (Theorem 1) that when no unbiased estimator of $\phi$ exists,
then reducing the bias to zero necessarily forces the variance of the estimators to tend to $\infty$.
This theorem therefore gives qualitative support to the widely held view that bias reduction is
by itself not a desirable property, but becomes desirable only if it can be demonstrated that it is
accompanied by a reduction in mean squared error.

2. MAIN RESULT AND REMARKS.


Let $(\mathcal{X}, \mathcal{S})$ be a measurable space and $\{P_\theta, \theta \in \Theta\}$ be a family of probability measures on
$(\mathcal{X}, \mathcal{S})$. Let $\phi$ be a real-valued function defined on $\Theta$. The bias of an estimator $T = T(X)$ is
defined by $\beta_T(\theta) = E_\theta(T(X)) - \phi(\theta)$, assuming that $E_\theta(T(X))$ exists.

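THEOREM 1. Suppose that

(A1) $P_{\theta_1} \ll P_{\theta_2}$ for all $\theta_1, \theta_2$ in $\Theta$,

(A2) $\int (dP_{\theta_1}/dP_{\theta_2})^2 \, dP_{\theta_2} < \infty$ for all $\theta_1, \theta_2$ in $\Theta$,

and that $\{T_k\}$ is a sequence of estimators for which

(1) $\beta_{T_k}(\theta) \to 0$ as $k \to \infty$, for all $\theta$ in $\Theta$.

If there does not exist an unbiased estimator of $\phi$ then

(2) $\mathrm{Var}_\theta(T_k) \to \infty$ as $k \to \infty$, for all $\theta \in \Theta$.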


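Proof: Suppose that (2) is not true. Then there exists a $\theta_0$ in $\Theta$ and a subsequence $\{k^*\}$ of $\{k\}$
such that $\mathrm{Var}_{\theta_0}(T_{k^*})$ is bounded. Now, consider the usual Hilbert space $H_{\theta_0} = L_2(\mathcal{X}, \mathcal{S}, P_{\theta_0})$ of
all functions that are square-integrable with respect to $P_{\theta_0}$. Notice that $\{T_{k^*}\}$ is a norm-bounded
set in $H_{\theta_0}$, since $E_{\theta_0}(T_{k^*}^2) = \mathrm{Var}_{\theta_0}(T_{k^*}) + (\phi(\theta_0) + \beta_{T_{k^*}}(\theta_0))^2$ and the bias converges by (1).
From the sequential weak-compactness of norm-bounded sets, there exists a $T$ in $H_{\theta_0}$
and a subsequence $\{k^{**}\}$ of $\{k^*\}$ such that $T_{k^{**}} \to T$ weakly in $H_{\theta_0}$ along the subsequence $\{k^{**}\}$,
i.e.

$\int T_{k^{**}} f \, dP_{\theta_0} \to \int T f \, dP_{\theta_0}$ for every function $f$ in $H_{\theta_0}$.

In particular, setting $f = dP_\theta/dP_{\theta_0}$, which exists by (A1) and belongs to $H_{\theta_0}$ by (A2), we get

$E_\theta(T_{k^{**}}) \to E_\theta(T)$,

along the subsequence $\{k^{**}\}$, for all $\theta$ in $\Theta$. From (1), it now follows that $E_\theta(T) = \phi(\theta)$, that is, $T$
is unbiased for $\phi$, which contradicts one of our assumptions. Hence (2) holds and the proof is
complete. ∎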

There are many examples of situations to which this theorem applies. One class can be
obtained from the idea of the following example. Consider the family of Poisson distributions
with parameter $\lambda$ with $\lambda > 0$. It is well known that there exists no unbiased estimator of $1/\lambda$,
and that all polynomials in $\lambda$ are unbiasedly estimable. From (a slight modification of) the
Stone-Weierstrass theorem, there exists a sequence of polynomials in $\lambda$ which converge to $1/\lambda$ for
each $\lambda$. Thus there exists a sequence of estimators which are unbiased for these polynomials
in $\lambda$, and whose biases in estimating $1/\lambda$ converge to zero. A simple calculation shows that
$\int (dP_{\lambda_1}/dP_{\lambda_2})^2 \, dP_{\lambda_2} = \exp(\lambda_2 - 2\lambda_1 + \lambda_1^2/\lambda_2) < \infty$. Thus Theorem 1 applies to this case and the variances
of these estimators must tend to $\infty$.
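
The two facts used in the Poisson example can be checked numerically. The sketch below assumes a truncated sum over the first 100 support points and uses hypothetical helper names. It checks that the falling factorial $X(X-1)\cdots(X-j+1)$ has expectation $\lambda^j$ (which is why every polynomial in $\lambda$ is unbiasedly estimable), and that $\int (dP_{\lambda_1}/dP_{\lambda_2})^2 \, dP_{\lambda_2} = \exp(\lambda_2 - 2\lambda_1 + \lambda_1^2/\lambda_2)$.

```python
import math


def log_poisson_pmf(x, lam):
    """Log of the Poisson(lam) probability of the integer x >= 0."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def falling_factorial(x, j):
    """x (x-1) ... (x-j+1); equals 1 when j == 0."""
    out = 1
    for i in range(j):
        out *= x - i
    return out


def expected_falling_factorial(j, lam, terms=100):
    """Truncated sum for E[X(X-1)...(X-j+1)] under Poisson(lam)."""
    return sum(
        falling_factorial(x, j) * math.exp(log_poisson_pmf(x, lam))
        for x in range(terms)
    )


def a2_integral(lam1, lam2, terms=100):
    """Truncated sum of p1(x)^2 / p2(x), computed in log space."""
    return sum(
        math.exp(2 * log_poisson_pmf(x, lam1) - log_poisson_pmf(x, lam2))
        for x in range(terms)
    )


lam1, lam2 = 1.5, 2.0  # illustrative parameter values
closed_form = math.exp(lam2 - 2 * lam1 + lam1 ** 2 / lam2)

print(expected_falling_factorial(3, lam2), lam2 ** 3)
print(a2_integral(lam1, lam2), closed_form)
```

Working in log space avoids dividing by probabilities that underflow to zero far out in the tail; the truncation point is an assumption that is adequate for moderate parameter values such as those shown.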

It may appear that Theorem 1 does not apply to estimates based on the jackknife, since the
“delete-one” jackknife can be formed only a finite number of times. However, a situation with an
infinite sequence of estimators based on the jackknife arises in the following example, based on an
idea of Gaver and Hoel (1970). Suppose that the data consists of a Poisson process $\{N(t);\ t \in [0, 1]\}$
with rate $\lambda$. In connection with the biased maximum likelihood estimator $\hat\phi = e^{-N(1)}$ of $e^{-\lambda}$,
Gaver and Hoel suggest splitting the interval $[0, 1]$ into $n$ nonoverlapping intervals each of length
$1/n$, and letting $N_i$ be the number of events in the $i$th interval. These are independent and
identically distributed and one can therefore form the delete-one jackknife as usual. This yields,
for each $n$, an estimate $\hat\phi_{(n)}$, and they show that as $n \to \infty$, $\hat\phi_{(n)}$ converges to an estimate $\hat\phi^{(1)}$
which depends on the Poisson process only through the sufficient statistic $N(1)$. This procedure
can be repeated indefinitely in principle, giving a sequence of estimators